Reconstructing Ancient Literary Texts from Noisy Manuscripts
Abstract
Given multiple corrupted versions of the same text, as is common with ancient manuscripts, we wish to reconstruct the original text from which the extant corrupted versions were copied (typically via latent intermediary versions). This is a challenge of cardinal importance in the humanities. We use a variant of expectation-maximization (EM) to solve this problem. We demonstrate the efficacy of our method on both synthetic and real-world data.
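The abstract does not spell out the EM variant used, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a deliberately simplified setting: all witnesses are already aligned and of equal length, each witness copies directly from the original (no latent intermediary versions), and each witness has a single unknown per-character reliability. The function name `em_reconstruct` and the parameters `alphabet_size` and `iterations` are hypothetical and introduced here for illustration.

```python
def em_reconstruct(witnesses, alphabet_size=27, iterations=20):
    """Toy EM over pre-aligned, equal-length witnesses (an assumed setting).

    Latent variable: the original character at each position.
    Parameter: reliability[j], the probability that witness j copies
    a character correctly; errors are assumed uniform over other characters.
    """
    n = len(witnesses[0])
    assert all(len(w) == n for w in witnesses), "witnesses must be aligned"

    reliability = [0.8] * len(witnesses)  # arbitrary starting guess
    posteriors = []

    for _ in range(iterations):
        # E-step: posterior over the original character at each position.
        # Approximation: only characters attested by some witness are
        # considered as candidates, with a uniform prior over them.
        posteriors = []
        for i in range(n):
            candidates = {w[i] for w in witnesses}
            scores = {}
            for c in candidates:
                p = 1.0
                for j, w in enumerate(witnesses):
                    if w[i] == c:
                        p *= reliability[j]
                    else:
                        p *= (1.0 - reliability[j]) / (alphabet_size - 1)
                scores[c] = p
            total = sum(scores.values())
            posteriors.append({c: s / total for c, s in scores.items()})

        # M-step: reliability = expected fraction of correctly copied
        # characters, clamped away from 0 and 1 to keep scores positive.
        for j, w in enumerate(witnesses):
            expected_correct = sum(posteriors[i][w[i]] for i in range(n))
            reliability[j] = min(max(expected_correct / n, 0.01), 0.99)

    # Read off the most probable character at each position.
    text = "".join(max(p, key=p.get) for p in posteriors)
    return text, reliability
```

Calling `em_reconstruct` on a handful of aligned strings returns the reconstructed text together with one reliability estimate per witness. Unlike a plain majority vote, the estimated reliabilities let a consistently accurate witness outweigh several careless ones, which is the main reason to prefer EM in this kind of setting.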
Similar resources
A Computational Model of Text Reuse in Ancient Literary Texts
We propose a computational model of text reuse tailored for ancient literary texts, available to us often only in small and noisy samples. The model takes into account source alternation patterns, so as to be able to align even sentences with low surface similarity. We demonstrate its ability to characterize text reuse in the Greek New Testament.
Reconstructing the Horizon Expectation of the Decanters in Safavid Era Age
Surāhī (decanter) can be regarded as one of the most common types of drinking vessels in Persian art as well as one of the containers mentioned most in Persian literature (especially in mystical literature). In this study, in order to complete this information in the Safavid era, in both domains, the method of reconstructing the horizons of Hans Robert Jauss was employed. Explaining the theoret...
Literary Figures in Gāthic Texts
Introduction: Gāthic texts are a collection of religious songs of Zarathustra, who lived about 1200 BC. Of the seventy-two hāts (stanzas) of Yasna (one of the five chapters of Avesta), seventeen hāts belong to five Gāthas. These seventeen hāts have been classified into five categories based on their syllabic meter and the number of the song: 1) ahunavaiti, 2) ushtavaiti, 3) spanta.mainyu, ...
Computational Methods for Coptic: Developing and Using Part-of-Speech Tagging for Digital Scholarship in the Humanities
This paper motivates and details the first implementation of a freely available part of speech tag set and tagger for Coptic. Coptic is the last phase of the Egyptian language family and a descendent of the hieroglyphs of ancient Egypt. Unlike classical Greek and Latin, few resources for digital and computational work have existed for ancient Egyptian language and literature until now. We evalu...
Computational Methods for Coptic
This paper motivates and details the first implementation of a freely available part of speech tag set and tagger for Coptic. Coptic is the last phase of the Egyptian language family and a descendant of the hieroglyphs of ancient Egypt. Unlike classical Greek and Latin, few resources for digital and computational work have existed for ancient Egyptian language and literature until now. We evalu...